Varying camera self-determination based on subject motion

ABSTRACT

In a method and digital camera, an initial set of evaluation images is captured. A plurality of characteristics of the initial set of evaluation images are assessed to provide a first assessment. The characteristics include subject motion between the initial set of evaluation images. When the subject motion is in excess of a predetermined threshold, a final capture state of the camera is set responsive to the first assessment. When the subject motion is less than the predetermined threshold, the evaluation images are analyzed to provide analysis results and the final capture state of the camera is set responsive to the first assessment and the analysis results.

CROSS REFERENCE TO RELATED APPLICATIONS

This is a continuation-in-part of application Ser. No. 11/399,076 filed on Apr. 6, 2006 entitled “VARYING CAMERA SELF-DETERMINATION BASED ON SUBJECT MOTION” by Bruce H. Pillman et al.

Reference is made to commonly assigned, co-pending U.S. patent application Ser. No. ______, (Attorney Docket No. 91686RLW), filed Apr. 6, 2006, entitled: CAMERA AND METHOD WITH ADDITIONAL EVALUATION IMAGE CAPTURE BASED ON SCENE BRIGHTNESS CHANGES, in the names of Bruce H. Pillman and Jiebo Luo.

FIELD OF THE INVENTION

The invention relates to photography and photographic equipment and methods and more particularly relates to varying camera self-determination based on subject motion.

BACKGROUND OF THE INVENTION

In capturing a scene with a camera, many parameters affect the quality and usefulness of the captured image. In addition to controlling overall exposure, exposure time affects motion blur, f/number affects depth of field, and so forth. In many cameras, all or some of these parameters can be controlled and are conveniently referred to as camera settings.

Methods for controlling exposure and focus are well known in both film-based and electronic cameras. However, the level of intelligence in these systems is limited by resource and time constraints in the camera. In many cases, knowing the type of scene being captured can easily lead to improved selection of capture parameters. For example, knowing a scene is a portrait allows the camera to select a wider aperture, to minimize depth of field. Knowing a scene is a sports/action scene allows the camera to automatically limit exposure time to control motion blur and adjust gain (exposure index) and aperture accordingly. Because this knowledge is useful in guiding simple exposure control systems, many film, video, and digital still cameras include a number of scene modes that can be selected by the user. These scene modes are essentially collections of parameter settings, which direct the camera to optimize parameters, given the user's selection of scene type.

The use of scene modes is limited in several ways. One limitation is that the user must select a scene mode for it to be effective, which is often inconvenient, even if the user understands the utility and usage of the scene modes.

A second limitation is that scene modes tend to oversimplify the possible kinds of scenes being captured. For example, a common scene mode is “portrait”, optimized for capturing images of people. Another common scene mode is “snow”, optimized to capture a subject against a background of snow, with different parameters. If a user wishes to capture a portrait against a snowy background, they must choose either portrait or snow, but they cannot combine aspects of each. Many other combinations exist, and creating scene modes for the varying combinations is cumbersome at best.

In another example, a backlit scene can be very much like a scene with a snowy background, in that subject matter is surrounded by background with a higher brightness. Few users are likely to understand the concept of a backlit scene and realize it has crucial similarity to a “snow” scene. A camera developer wishing to help users with backlit scenes will probably have to add a scene mode for backlit scenes, even though it may be identical to the snow scene mode.

Both of these scenarios illustrate the problems of describing photographic scenes in a way accessible to a casual user. The number of scene modes required expands greatly and becomes difficult to navigate. The proliferation of scene modes ends up exacerbating the problem that many users find scene modes excessively complex.

Attempts to automate the selection of a scene mode have been made. For example, United States Published Patent Application US 2003/0007076 A1, “Image Processing Apparatus and Image-Quality Control Method,” Noriyuki Okisu et al., assigned to Minolta Co., Ltd., published Jan. 9, 2003, teaches a method for automatic selection of scene mode based on focus data, scene brightness, and focal length. Similarly, U.S. Pat. No. 6,301,440, “System and Method for Automatically Setting Image Acquisition Parameters,” Rudolf M. Bolle et al., assigned to International Business Machines Corp., issued Oct. 9, 2001, teaches a method for automatic selection of a scene mode and use of a photographic expert unit to automatically set parameters for image capture. Both of these methods disclose ways to use information from evaluation images and other data to determine a scene mode. The scene mode is then used to select a set of capture parameters from several sets of capture parameters that are optimized for each scene mode.

A limitation on automated methods is that such methods tend to be computationally intensive relative to the simpler methods. Cameras tend to be relatively limited in computing resources, in order to reduce cost, cut energy drain, and the like. This has resulted in noticeable lag between shutter trip and image capture in some cameras. Such lag is highly undesirable when a subject to be photographed is in motion. One solution to the problem of lag is avoidance of highly time-consuming computations. This leads back again to the use of modes.

It would thus be desirable to provide improved cameras and methods, in which camera settings are automatically determined and the above shortcomings are at least partially mitigated.

SUMMARY OF THE INVENTION

The invention is defined by the claims. The invention, in broader aspects, provides methods and cameras, in which a camera capture state is self-determined by the camera. An initial set of evaluation images is captured and characteristics of the initial set of evaluation images are assessed to provide a first assessment. Those characteristics include subject motion between the initial set of evaluation images. When the subject motion is in excess of a predetermined threshold, a final capture state of the camera is set responsive to the first assessment. When the subject motion is less than the predetermined threshold, the evaluation images are further analyzed to provide analysis results and the final capture state is set responsive to the analyzing. In a particular embodiment of the invention, when said subject motion is less than the predetermined threshold, one or more additional evaluation images are captured after the capturing of the initial set of evaluation images and their characteristics are determined to provide a second assessment. The second assessment is analyzed to provide analysis results and the final capture state is responsive to all of the analysis results.

It is an advantageous effect of the invention that improved cameras and methods are provided, which allow camera settings to be automatically determined in a computationally intensive manner and also allow subject motion to be accommodated.

BRIEF DESCRIPTION OF THE DRAWINGS

The above-mentioned and other features and objects of this invention and the manner of attaining them will become more apparent and the invention itself will be better understood by reference to the following description of an embodiment of the invention taken in conjunction with the accompanying figures wherein:

FIG. 1 is a block diagram showing the major components of a digital camera.

FIG. 2 is a semi-diagrammatical rear view of the camera of FIG. 1.

FIG. 3 is a diagrammatical front view of the filter wheel of the camera of FIG. 1.

FIG. 4 is a diagrammatical front view of the diaphragm of the camera of FIG. 1.

FIG. 5 is a diagram of the grid of regions formed by the sensors of the rangefinder of the camera of FIG. 1.

FIG. 6 is a flow chart of the steps of a method of evaluating subject motion in determining camera settings for image capture.

FIG. 7 is a flow chart of the steps of a method of considering brightness changes in determining camera settings for image capture.

FIG. 8 is a detailed flow diagram of an embodiment incorporating the methods of both FIG. 6 and FIG. 7.

FIG. 9 is a detailed flow chart of a modification of the method of FIG. 8, which is limited to the method of FIG. 6.

FIG. 10 is a detailed flow chart of a modification of the method of FIG. 8, which is limited to the method of FIG. 7.

FIG. 11 is a detailed flow diagram of complex feature analysis in the methods of FIGS. 8 and 10.

FIG. 12 is a detailed flow diagram of complex feature analysis in the method of FIG. 9.

FIG. 13 illustrates different scene compositions that are subject to different treatment by the camera of FIG. 1.

FIG. 14 illustrates block-based motion analysis in a modification of the camera of FIG. 1.

FIGS. 15A-15B are diagrammatical views illustrating image data used in the motion estimation to compute costs associated with different motion offsets in the camera of FIG. 1.

FIGS. 16A-16B are diagrammatical views of the summation of data within rows to form vectors used in the motion analysis of the camera of FIG. 1.

FIGS. 17A-17B are the same views as FIGS. 16A-16B of the summation of data within columns to form vectors used for the motion analysis of the camera of FIG. 1.

FIG. 18 is a diagrammatical view of an embodiment of the system.

DETAILED DESCRIPTION OF THE INVENTION

The following discussion refers to both still cameras and video cameras. It will be understood that the respective terms are inclusive of both dedicated still and video cameras and of combination still/video cameras, as used for the respective still or video capture function.

The term “capture state” is used herein to refer collectively to a plurality of camera settings that are or can be used together during a particular picture taking event to moderate how a light image is captured. Each setting is variable and affects one or more characteristics of an archival image captured and stored by the camera. Examples of settings include: lens aperture, lens focal length, shutter speed, flash condition, focus parameters, exposure parameters, white balance, image resolution, sensor gain, color saturation, sharpening filter parameters, and the like. Settings available with an individual camera vary depending upon camera characteristics. A capture state may or may not fully determine settings for a particular image capture. For example, a capture state can define flash output prior to picture taking or can define flash output as being met when light returned from a photographed subject reaches a particular level. Similarly, a capture state can define settings, which will be applied in the absence of a user override of one or more of those settings. For example, focus can be set by the user to remain at infinity, during a particular picture taking session. Likewise, a capture state can define one or more alternate settings based upon a later determined parameter, such as user activation of full flash or fill flash.
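
For illustration only, such a capture state can be thought of as a record of settings. The following Python sketch is a hypothetical rendering of that idea, not an implementation from this disclosure; all field names are assumptions drawn from the settings listed above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CaptureState:
    """Hypothetical bundle of camera settings used for one picture-taking event."""
    focal_length_mm: float
    focus_distance_m: float           # may be pinned by the user, e.g. locked at infinity
    aperture_f_number: float
    exposure_time_s: float
    sensor_gain_iso: int
    flash_mode: str = "auto"          # e.g. "off", "fill", "full", or quenched at a return-light level
    white_balance: str = "auto"
    overrides: Optional[dict] = None  # settings the user has overridden for this session
```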

The term “archival image” is used herein to refer to a digital image stored in memory and accessible to the user following a capture event. An archival image is distinguished from other non-archival electronic images produced during capture of a light image of a scene. Such non-archival images include earlier images in the imaging chain leading to the archival image, such as the initial analog electronic image captured by the image sensor of the camera and the initial digital image produced by digitizing the initial analog image. In those cases, the non-archival images and the resulting archival image are all produced from the same light image. Another type of non-archival image is an image used in viewfinding, setting exposure and focus, and the like. These non-archival images may be shown to the user on a viewfinder or the like, but are not made available for ordinary use subsequent to capture. These non-archival images can be automatically deleted by reuse of the memory used for storing them.

The terms “evaluation image” and “final image” are also used herein. Evaluation images are captured during camera set-up. Final images are captured following camera set-up. Final images are archival images. Evaluation images can be archival or non-archival, depending on camera set-up. Evaluation images can have the same resolution as archival images or can have a lower resolution. Depending upon the type of image sensor, it may be convenient to capture each evaluation image as a high resolution image, followed by irreversible conversion to a sampled, low resolution subset of the original image. The low resolution subset can be provided using the method described in commonly-assigned U.S. Pat. No. 5,164,831, “ELECTRONIC STILL CAMERA PROVIDING MULTI-FORMAT STORAGE OF FULL AND REDUCED RESOLUTION IMAGES”, to Kuchta et al. Two electronic capture units can be present in the camera, with one used as the evaluation image capture unit and the other used as the archival image capture unit. An example of a suitable digital camera having two such electronic capture units is described in U.S. Pat. No. 5,926,218, entitled “ELECTRONIC CAMERA WITH DUAL RESOLUTION SENSORS” to Smith.

The camera can be a still camera, a video camera, or combine both capabilities. With a still camera, it is typically convenient to treat evaluation images as non-archival, on the assumption that the user intended to capture only the final image and the evaluation images are surplusage. With a video camera, it is typically convenient to treat both evaluation and final images as archival, on the assumption that the user intended to capture all available images. Individual cameras can be limited to a particular set-up, or treatment of evaluation images can be varied automatically or as a user-selectable option. More complex arrangements are also possible, such as treating different evaluation images in a capture sequence differently. For convenience, the discussion here is limited to embodiments in which evaluation images from a still-capture event are all non-archival and evaluation images from a video-capture event are all archival. It will be understood that like considerations apply to other embodiments.

In the following description, some features are described as “software” or “software programs”. Those skilled in the art will recognize that the equivalent of such software can also be readily constructed in hardware. Because image manipulation algorithms and systems are well known, the present description emphasizes algorithms and features forming part of, or cooperating more directly with, the method. Other aspects of such algorithms and apparatus, and hardware and/or software for producing and otherwise processing the image signals involved therewith, not specifically shown or described herein, may be selected from such systems, algorithms, components, and elements known in the art. Given the description as set forth in the following specification, all software implementation thereof is conventional and within the ordinary skill in such arts.

The invention is inclusive of combinations of the embodiments described herein. References to “a particular embodiment” and the like refer to features that are present in at least one embodiment of the invention. Separate references to “an embodiment” or “particular embodiments” or the like do not necessarily refer to the same embodiment or embodiments; however, such embodiments are not mutually exclusive, unless so indicated or as are readily apparent to one of skill in the art. The use of singular and/or plural in referring to the “method” or “methods” and the like is not limiting.

Referring to FIGS. 1-5, in a particular embodiment, the camera 10 has a body 12 that provides structural support and protection for other components. The body 12 can be varied to meet requirements of a particular use and style considerations. An electronic image capture unit 14, which is mounted in the body 12, has a taking lens 16 and an electronic array image sensor 18 aligned with the taking lens 16. Light from a subject scene propagates along an optical path 20 through the taking lens 16 and strikes the image sensor 18, producing an analog electronic image.

The type of image sensor used may vary, but it is highly preferred that the image sensor be one of the several solid-state image sensors available. For example, the image sensor can be a charge-coupled device (CCD), a CMOS sensor (CMOS), or a charge injection device (CID). The electronic image capture unit includes other components associated with the image sensor. A typical image sensor is accompanied by separate components that act as clock drivers (also referred to herein as a timing generator), analog signal processor (ASP), and analog-to-digital converter/amplifier (A/D converter). Such components can also be incorporated in a single unit with the image sensor. For example, CMOS image sensors are manufactured with a process that allows other components to be integrated onto the same semiconductor die.

The electronic image capture unit 14 captures an image with three or more color channels. It is currently preferred that a single image sensor be used along with a color filter array; however, multiple monochromatic image sensors and filters can be used. Suitable filters are well known to those of skill in the art and, in some cases, are incorporated with the image sensor to provide an integral component.

Those skilled in the art will recognize that some procedures described herein in relation to digital images having multiple color channels can also be limited to one or more, but less than all, of the channels. Suitability of this approach can be determined heuristically. Those skilled in the art will also recognize that describing a digital image processing step herein as replacing original pixel values with processed pixel values is functionally equivalent to describing the same step as generating a new digital image with the processed pixel values while retaining the original pixel values.

The electrical signal from each pixel of the image sensor 18 is related to both the intensity of the light reaching the pixel and the length of time the pixel is allowed to accumulate or integrate the signal from incoming light. This time is called the integration time or exposure time.

Integration time is controlled by a shutter 22 that is switchable between an open state and a closed state. The shutter 22 can be mechanical or electromechanical or can be provided as a logical function of the hardware and software of the electronic image capture unit. For example, some types of image sensors allow the integration time to be controlled electronically by resetting the image sensor and then reading out the image sensor some time later. When using a CCD, electronic control of the integration time of the image sensor 18 can be provided by shifting the accumulated charge under a light-shielded register provided at a non-photosensitive region. This can be a full frame, as in a frame transfer device CCD, or a horizontal line, as in an interline transfer device CCD. Suitable devices and procedures are well known to those of skill in the art. Thus, the timing generator 24 can provide a way to control when the image sensor 18 is actively recording the image. In the camera 10 of FIG. 1, the shutter 22 and the timing generator 24 jointly determine the integration time.

The combination of overall light intensity and integration time is called exposure. Equivalent exposures can be achieved by various combinations of light intensity and integration time. Although the exposures are equivalent, a particular exposure combination of light intensity and integration time may be preferred over other equivalent exposures for capturing an image of a given scene.

Although FIG. 1 shows several exposure controlling elements, some embodiments may not include one or more of these elements, or there may be alternative mechanisms of controlling exposure. The camera can have alternative features to those illustrated. For example, shutters that also function as diaphragms are well known to those of skill in the art.

In the illustrated camera, a filter assembly 26 and diaphragm 28 modify the light intensity at the sensor 18. Each is adjustable. The diaphragm 28 controls the intensity of light reaching the image sensor 18 using a mechanical aperture (not shown) to block light in the optical path 20. The size of the aperture can be continuously adjustable, stepped, or otherwise varied. As an alternative, the diaphragm 28 can be emplaceable in and removable from the optical path 20. Filter assembly 26 can be varied likewise. For example, filter assembly 26 can include a set of different neutral density filters that can be rotated or otherwise moved into the optical path. In FIG. 3, an example of the filter assembly 26 has a filter wheel 30 with different neutral density filters 32 that are rotatable into the optical path (illustrated by a cross 20a). The filter wheel 30 is directly driven by a driver 34, such as a stepper motor. In FIG. 4, an example of a diaphragm 28, in the form of a set of Waterhouse stops in a diaphragm wheel 38, is also illustrated. The differently sized apertures 40 of the diaphragm are rotatable into the optical path (illustrated by circle 20a) by a driver 42, such as a stepper motor. (Stepper motors are illustrated in FIGS. 3-4 as pinions meshed with the respective wheels. Directions of rotation are indicated by double-headed arrows.) Other suitable filter assemblies and diaphragms are well known to those of skill in the art.

The camera 10 has an optical system 44 that includes the taking lens 16 and can also include components (not shown) of a viewfinder 46. The optical system 44 can take many different forms. For example, the taking lens can be fully separate from an optical viewfinder or from a digital viewfinder that consists of an eyepiece provided over an internal display. The viewfinder lens unit and taking lens can also share one or more components. Details of these and other alternative optical systems are well known to those of skill in the art. For convenience, the optical system 44 is generally discussed hereafter in relation to an embodiment having a digital viewfinder and a separate on-camera display 48 that can also be used to view a scene, as is commonly done with digital cameras.

The taking lens 16 can be simple, such as having a single focal length and manual focusing or a fixed focus, but this is not preferred. In the camera shown in FIG. 1, the taking lens 16 is a motorized zoom lens in which a mobile element or elements are driven, relative to one or more other lens elements, by a zoom control-driver 50. This allows the effective focal length of the lens to be changed. Digital zooming (digital enlargement of a digital image) can also be used instead of or in combination with optical zooming. The taking lens can also include elements or groups (not shown) that can be inserted into or removed from the optical path by a macro control-driver 52 so as to provide a macro (close focus) capability.

The taking lens unit 16 of the camera 10 is also preferably autofocusing. For example, an autofocusing system can provide passive or active autofocus or a combination of the two. Referring to FIG. 1, one or more focus elements (not separately shown) of the taking lens are driven by a focus control-driver 54 to focus rays from a particular distance on the image sensor 18. The autofocusing system has a rangefinder 56 that has one or more sensing elements that send a signal to the control unit, which does a focus analysis of the signal and then operates focus driver 54 to move the focusable element or elements (not separately illustrated) of the taking lens 16.

Referring now to FIG. 5, in particular embodiments, a rangefinder 56 of the camera 10 divides a transmitted image 92 of the scene into a grid 91 of regions 90 (illustrated as boxes in FIG. 5) and senses distances, for each region 90, to within the limits of one of several distance ranges. A wide variety of suitable multiple sensor rangefinders are known to those of skill in the art. For example, U.S. Pat. No. 5,440,369 discloses such a rangefinder. The rangefinder 56 then provides the distance range for each region 90 to the system controller 66, which then determines a subject-background pattern of the scene, as discussed below. The functions of the rangefinder can alternatively be provided as software and hardware functions of the capture unit and control unit (discussed below).

The camera 10 includes a brightness sensor 58. In FIG. 1, the brightness sensor 58 is shown as one or more separate components. The brightness sensor 58 can also be provided as a logical function of hardware and software of the capture unit 14. The brightness sensor 58 has a driver that operates a single sensor or multiple sensors and provides at least one signal representing scene light intensity for use in the analysis of exposure of the scene. As an option, this signal can also provide color balance information. An example of a suitable brightness sensor that can be used to provide one or both of scene illumination and color value, and is separate from the electronic image capture unit 14, is disclosed in U.S. Pat. No. 4,887,121.

The camera of FIG. 1 includes a flash unit 60, which has an electronically controlled illuminator such as a xenon flash tube 61 (labelled “FLASH” in FIG. 1). A flash sensor 62 can optionally be provided, which outputs a signal responsive to the light sensed from the scene during archival image capture or by means of a preflash prior to archival image capture. The flash sensor signal is used in controlling the output of the flash unit by means of a dedicated flash controller 63 or as a function of the control unit. Alternatively, flash output can be fixed or varied based upon other information, such as focus distance. The function of flash sensor 62 and brightness sensor 58 can be combined in a single component or logical function of the capture unit and control unit.

The image sensor 18 receives a light image (the scene image) and converts the light image to an analog electronic image. The electronic image sensor 18 is operated by an image sensor driver. The electronic image is ultimately transmitted to the image display 48, which is operated by an image display controller-driver 64. Different types of image display 48 can be used. For example, the display 48 can be a liquid crystal display (“LCD”) or an organic electroluminescent display (“OLED”).

The control unit 65 controls or adjusts the exposure regulating elements and other camera components, facilitates transfer of images and other signals, and performs processing related to the images. The control unit 65 shown in FIG. 1 includes a system controller 66, timing generator 24, analog signal processor 68, an A/D converter 80, digital signal processor 70, and memory 72a-72d. Suitable components for the control system are known to those of skill in the art. These components can be provided as enumerated or by a single physical device or by a larger number of separate components. The controller 66 can take the form of an appropriately configured microcomputer, such as an embedded microprocessor having RAM for data manipulation and general program execution. Modifications of the control unit 65 are practical, such as those described elsewhere herein.

The timing generator 24 supplies control signals for all electronic components in timing relationship. Calibration values for the individual camera 10 are stored in a calibration memory (not separately illustrated), such as an EEPROM, and supplied to the controller 66. Components of a user interface (discussed below) are connected to the control unit 65 and function by means of a combination of software programs executed on the system controller 66. The control unit 65 also operates the drivers and memories, including the zoom driver 50, focus driver 54, macro driver 52, display drivers 64, and other drivers (not shown) for the shutter 22, diaphragm 28, filter assembly 26, and viewfinder and status displays 74, 76.

The camera 10 can include other components to provide information supplemental to captured image information. An example of such a supplemental information component 78 is the orientation sensor illustrated in FIG. 1. Other examples include a real time clock, motion sensors, a global positioning system receiver, and a keypad or other entry device for entry of user captions or other information.

It will be understood that the circuits shown and described can be modified in a variety of ways well known to those of skill in the art. It will also be understood that the various features described here in terms of physical circuits can be alternatively provided as firmware or software functions or a combination of the two. Likewise, components illustrated as separate units herein may be conveniently combined or shared. Multiple components can be provided in distributed locations.

The initial electronic image from the image sensor is amplified and converted from analog to digital by the analog signal processor 68 and analog-to-digital (A/D) converter-amplifier 80 to a digital electronic image, which is then processed in the digital signal processor 70 using DSP memory 72a and stored in system memory 72b and/or removable memory 72c. Signal lines, illustrated as a data bus 81, electronically connect the image sensor 18, system controller 66, digital signal processor 70, the image display 48, and other electronic components, and provide a pathway for address and data signals.

“Memory” refers to one or more suitably sized logical units of physical memory provided in semiconductor memory or magnetic memory, or the like. Memory 72a-72d can each be any type of random access memory. For example, memory can be an internal memory, such as a Flash EPROM memory, or alternately a removable memory, such as a Compact Flash card, or a combination of both. Removable memory 72c can be provided for archival image storage. Removable memory can be of any type, such as a Compact Flash (CF) or Secure Digital (SD) type card inserted into a socket 82 and connected to the system controller 66 via memory card interface 83. Other types of storage that are utilized include, without limitation, PC-Cards and MultiMedia Cards (MMC).

The system controller 66 and digital signal processor 70 can be controlled by software stored in the same physical memory that is used for image storage, but it is preferred that the processor 70 and controller 66 are controlled by firmware stored in dedicated memory 72d, for example, in a ROM or EPROM firmware memory. Separate dedicated units of memory can also be provided to support other functions. The memory on which captured images are stored can be fixed in the camera 10 or removable or a combination of both. The type of memory used and the manner of information storage, such as optical or magnetic or electronic, is not critical. For example, removable memory can be a floppy disc, a CD, a DVD, a tape cassette, or a flash memory card or stick. The removable memory can be utilized for transfer of image records to and from the camera in digital form, or those image records can be transmitted as electronic signals.

Digital signal processor 70 is one of two processors or controllers in this embodiment, in addition to system controller 66. Although this partitioning of camera functional control among multiple controllers and processors is typical, these controllers or processors can be combined in various ways without affecting the functional operation of the camera and the application of the present invention. These controllers or processors can comprise one or more digital signal processor devices, microcontrollers, programmable logic devices, or other digital logic circuits. Although a combination of such controllers or processors has been described, it should be apparent that one controller or processor can perform all of the needed functions. All of these variations can perform the same function.

In the illustrated embodiment, digital signal processor 70 manipulates the digital image data in its memory 72a according to a software program permanently stored in program memory 72d and copied to memory 72b for execution during image capture. Digital signal processor 70 executes the software necessary for practicing image processing. The digital image can also be modified in the same manner as in other digital cameras to enhance images. For example, the image can be processed by the digital signal processor to provide interpolation and edge enhancement. Digital processing of an electronic archival image can include modifications related to file transfer, such as JPEG compression and file formatting. Metadata can also be provided in a manner well known to those of skill in the art.

System controller 66 controls the overall operation of the camera based on a software program stored in program memory 72d, which can include Flash EEPROM or other nonvolatile memory. This memory can also be used to store image sensor calibration data, user setting selections, and other data which must be preserved when the camera is turned off. System controller 66 controls the sequence of image capture by directing the macro control 52, flash control 63, focus control 54, zoom control 50, and other drivers of capture unit components as previously described, directing the timing generator 24 to operate the image sensor 18 and associated elements, and directing digital signal processor 70 to process the captured image data. After an image is captured and processed, the final image file stored in system memory 72b or DSP memory 72a is transferred to a host computer via interface 84, stored on a removable memory card 72c or other storage device, and displayed for the user on image display 48. Host interface 84 provides a high-speed connection to a personal computer or other host computer for transfer of image data for display, storage, manipulation, or printing. This interface can be an IEEE 1394 or USB 2.0 serial interface or any other suitable digital interface. In the method, the transfer of images in digital form can be on physical media or as a transmitted electronic signal.

In the illustrated camera 10, processed images are copied to a display buffer in system memory 72b and continuously read out via video encoder 86 to produce a video signal. This signal is processed by display controller 64 and/or digital signal processor 70 and presented on image display 48, and can be output directly from the camera for display on an external monitor. The video images are archival if the camera is used for video capture and non-archival if used for viewfinding prior to still archival image capture.

The camera has a user interface 88, which provides outputs to the photographer and receives photographer inputs. The user interface 88 includes one or more user input controls 93 (labelled “USER INPUTS” in FIG. 1) and image display 48. User input controls 93 can include a shutter release 94, a “zoom in/out” control 95 that controls the zooming of the lens units, and other user controls 96. User input controls can be provided in the form of a combination of buttons, rocker switches, joysticks, rotary dials, touch screens, and the like.

The user interface 88 can include one or more information displays 97 to present camera information to the photographer, such as exposure level, exposures remaining, battery state, flash state, and the like. The image display can instead or additionally be used to display non-image information, such as camera settings. For example, a graphical user interface (GUI) can be provided, including menus presenting option selections and review modes for examining captured images. Both the image display and a digital viewfinder display can provide the same functions, and one or the other can be eliminated. The camera can include a speaker, which provides audio warnings instead of, or in addition to, visual warnings depicted on the information display, image display 48, or both. The components of the user interface are connected to the control unit and function by means of a combination of software programs executed on the system controller 66.

Different types of image display 48 can be used. For example, the image display can be a liquid crystal display (“LCD”), a cathode ray tube display, or an organic electroluminescent display (“OLED”). The image display 48 is preferably mounted on the camera body so as to be readily viewable by the photographer.

As a part of showing an image on the image display, the camera can modify the image for calibration to the particular display. For example, a transform can be provided that modifies each image to accommodate the different capabilities in terms of gray scale, color gamut, and white point of the display and the image sensor and other components of the electronic capture unit. It is preferred that the display is selected so as to permit the entire image to be shown; however, more limited displays can be used. In the latter case, the displaying of the image includes calibration that cuts out part of the image, or contrast levels, or some other part of the information in the image.

It will also be understood that the camera herein is not limited to a particular feature set, except as defined by the claims. For example, the camera can include any of a wide variety of features not discussed in detail herein, such as detachable and interchangeable lenses. The camera can also be portable or fixed in position and can provide one or more other functions related or unrelated to imaging. For example, the camera can be a cell phone camera or can provide communication functions in some other manner. Likewise, the camera can include computer hardware and computerized equipment. The camera can include multiple capture units.

For example, referring to FIG. 18, there is illustrated a camera in the form of a computer system 1110 and a tethered capture unit. The camera can likewise be a portable computer, a kiosk, or another system for the capture and processing of digital images. The computer system 1110 includes a microprocessor-based unit 1112 for receiving and processing software programs and for performing other processing functions. Images are input directly via a cable connection 1138 to the microprocessor-based unit 1112 or via a wireless connection 1140 to the microprocessor-based unit 1112.

A display 1114 is electrically connected to the microprocessor-based unit 1112 for displaying user-related information associated with the software, e.g., by means of a graphical user interface. A keyboard 1116 is also connected to the microprocessor-based unit 1112 for permitting a user to input information to the software. As an alternative to using the keyboard 1116 for input, a mouse 1118 may be used for moving a selector 1120 on the display 1114 and for selecting an item on which the selector 1120 overlays, as is well known in the art.

Removable memory, in any form, can be included and is illustrated as a compact disk-read only memory (CD-ROM) 1124, which can include software programs and is inserted into the microprocessor-based unit for providing a means of inputting the software programs and other information to the microprocessor-based unit 1112. Multiple types of removable memory can be provided (illustrated here by a floppy disk 1126) and data can be written to any suitable type of removable memory. Memory can be external and accessible using a wired or wireless connection, either directly or via a local or wide area network, such as the Internet. Still further, the microprocessor-based unit 1112 may be programmed, as is well known in the art, for storing software programs internally. A printer 1128 or other output device can also be connected to the microprocessor-based unit 1112 for printing a hardcopy of the output from the computer system 1110. The microprocessor-based unit 1112 can have a network connection 1127, such as a telephone line or wireless link, to an external network, such as a local area network or the Internet. One or more of the devices illustrated in FIG. 18 can be located remotely and can be connected via a network. One or more of the devices can be connected wirelessly, such as by an infrared or radio-frequency link, either directly or via a network.

The output device provides a final image that has been subject to transformations. The output device can be a printer or other output device that provides a paper or other hard copy final image. The output device can also be an output device that provides the final image as a digital file. The output device can also include combinations of output, such as a printed image and a digital file on a memory unit, such as a CD or DVD.

The microprocessor-based unit 1112 provides means for processing the digital images to produce pleasing looking images on the intended output device or media. The present invention can be used with a variety of output devices that can include, but are not limited to, a digital photographic printer and soft copy display. The microprocessor-based unit 1112 can be used to process digital images to make adjustments for overall brightness, tone scale, image structure, etc. of digital images in a manner such that a pleasing looking image is produced by an image output device.

In use, the camera is turned on and evaluation images are captured. The evaluation images are available for display by a digital viewfinder or the camera display for use in image composition. The evaluation images are captured in a continuous stream or sequence.

To take a picture with the camera, the shutter release is actuated by the user and trips from a set state to an intermediate state, and then trips to a released state. The separate stages are sometimes referred to as the “first stroke” and “second stroke”, respectively. The intermediate state can be used, in a conventional manner, to lock in the settings of the current final capture state of the camera. Alternatively, the intermediate state can be eliminated. This is convenient for capture of video.

In the methods, following the initiation of evaluation image capture, evaluation images and other camera data are considered by the control unit in determining the camera settings of a final image capture state for use in archival image capture. Subject motion and brightness changes between evaluation images can either or both be considered. FIGS. 6 and 7 present the general features of a method considering subject motion and a method considering brightness changes, respectively.

In the method of FIG. 6, an initial set of two or more evaluation images is captured (300) and a plurality of characteristics of that set are assessed (302) to provide a first assessment. It is highly preferred, to save time, that the initial set be limited to a pair of evaluation images and that the pair of images be successive images in the stream. The initial set of images can be at the beginning of the stream or at a later point in the stream during an iteration of the process. The characteristics assessed include subject motion between the two or more evaluation images of the initial set. Other characteristics assessed include brightness data and other information conventionally used for autofocus, autoexposure, and flash readying. These characteristics can be determined in relation to one or more of the evaluation images. The subject motion determination necessarily requires multiple images. Following the assessing, the motion assessed is compared (304) to a predetermined threshold. When the motion is in excess of the threshold, a final capture state is set (306) based on the first assessment. When the motion is not in excess of the threshold, a second assessment is conducted. In the second assessment, evaluation images are further analyzed (308) to provide analysis results and the final capture state is set (310) based on the analysis results. The threshold can be set heuristically. A simple example of a threshold is no subject motion found in the first assessment.
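
The branch structure of FIG. 6 can be summarized in a short sketch. The camera object and its helper methods (capture_evaluation_pair, assess, analyze, set_capture_state) are assumed stand-ins for the operations described above, not an actual camera API:

```python
MOTION_THRESHOLD = 0.0  # simplest case from the text: any detected subject motion exceeds it

def determine_final_capture_state(camera):
    # Step 300: capture an initial set (preferably a successive pair) of evaluation images
    images = camera.capture_evaluation_pair()
    # Step 302: first assessment, including subject motion between the images
    first = camera.assess(images)         # brightness, focus data, subject motion, etc.
    if first.subject_motion > MOTION_THRESHOLD:
        # Step 306: motion is too high to spend time on further analysis
        camera.set_capture_state(from_assessment=first)
    else:
        # Steps 308-310: further, more computationally intensive analysis
        results = camera.analyze(images)  # e.g. depth of field, tonality, semantic features
        camera.set_capture_state(from_assessment=first, analysis=results)
```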

The second assessment can utilize one or more additional evaluation images of the sequence. It is currently preferred that the additional image or images be successive images that immediately follow the initial set, but a gap of unused evaluation images can exist between the initial set and the additional images. The number of additional images is a matter of convenience and processing constraints, in view of time requirements. It is desirable that the final capture state be set without a noticeable delay in final image capture, or with only a slight delay.

In the second assessment, characteristics of the additional one or more images are determined. Both assessments are then analyzed and the final capture state is set responsive to the analyzing. This necessarily consumes more time than the first assessment alone. The final capture state following the analyzing, in many cases, will differ from the final capture state earlier determined based on only the first assessment in values of one or more of: focal length, focus distance, aperture, exposure time, and gain.

The characteristics determined in the second assessment can be the same as those of the first assessment, including subject motion, or can vary. Additional characteristics that are more computationally intensive than those of the first assessment can also be considered. The second assessment can include consideration of depth of field and tonality, as discussed below in detail.

The term “tonality” is used herein to refer to the overall grey scale or tone scale of the densities of regions of an image with respect to the effectiveness of the values in representing the grey scale or tone scale of the subject of the image. A binary representation of a color subject has low tonality.

The characteristics in the first and second assessments are limited by processing constraints. The processing provided in the first assessment is more limited than in the second assessment, but, if undue delay is not incurred, one or both of the assessments can include more complex determinations, such as determinations of semantic features, such as locations, depth of field, and other features of faces.

In the method of FIG. 7, initial evaluation images are captured (312). The camera is initially in a default state, which can be preset or based upon currently measured parameters, such as detected scene brightness. A change in scene brightness between two or more of the initial evaluation images is computed (314). The brightness change has possible values with magnitudes from zero or unmeasurable change to a maximum measurable by the camera. The scene brightness change is compared (316) to a predetermined scene brightness range that is intermediate relative to the possible values of brightness change. When the scene brightness change is outside the predetermined scene brightness range, a scene-to-capture mismatch is computed (318). This mismatch is an estimate that is based upon characteristics of the image (also referred to herein as “markers”) that are indicative of a failure of the camera to capture one or more aspects of the light image of the scene. The mismatch can be in the form of a metric.

The mismatch is compared (320) to a predetermined mismatch range. When the mismatch is outside the mismatch range, the camera is shifted (322) to a second capture state and additional evaluation images are captured (324). When the scene brightness is in the scene brightness range or the mismatch is in the mismatch range, the capture of additional evaluation images is skipped. A final capture state is determined (326) using the available evaluation images and final images are captured (328) with the camera in the final capture state.
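
A minimal sketch of this decision flow, assuming hypothetical helpers (scene_brightness_change, scene_to_capture_mismatch, and the camera methods) that stand in for the numbered steps above:

```python
def evaluate_brightness_path(camera, brightness_range, mismatch_range):
    images = camera.capture_evaluation_images()            # step 312, camera in default state
    change = scene_brightness_change(images)               # step 314
    if not in_range(change, brightness_range):             # step 316
        mismatch = scene_to_capture_mismatch(images[-1])   # step 318, marker-based metric
        if not in_range(mismatch, mismatch_range):         # step 320
            camera.shift_to_second_capture_state(mismatch) # step 322, partially corrective
            images += camera.capture_evaluation_images()   # step 324
    final_state = determine_final_state(images)            # step 326
    return camera.capture_final_images(final_state)        # step 328

def in_range(value, rng):
    lo, hi = rng
    return lo <= value <= hi
```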

In a particular embodiment, each evaluation image has associated depth of field information and corresponding distance range information. In that case, markers for one or both of depth of field problems and tonality accumulation can be evaluated. The computing of the mismatch for depth of field problems assesses differences between the distance range information and the depth of field information. Tonality accumulations are considered in relation to highlights (brightest pixels in an image) and shadows (darkest pixels in the image). A tonality accumulation in a captured image is a zone of shadow or highlight having a narrow or single-step tone scale, rather than a broader multi-step tone scale characteristic of other parts of the image. Tonality accumulation is indicative of information loss relative to a corresponding light image of a scene and the available tone scale of a particular capture system. The characteristics of tonality accumulations are well known to those of skill in the art and can be readily determined heuristically for a particular camera.
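
One plausible way to flag tonality accumulations, offered here as an assumed heuristic rather than the disclosed algorithm, is to test whether a disproportionate fraction of pixels collapses into the extreme bins of the histogram:

```python
import numpy as np

def tonality_accumulation(gray_image, tail_levels=4, threshold=0.05):
    """Flag shadow/highlight zones squeezed into a narrow tone scale.

    Assumed heuristic: if more than `threshold` of the pixels fall within
    `tail_levels` code values of black or white, information was likely lost
    relative to the scene's tone scale. Returns (shadow_flag, highlight_flag).
    """
    hist, _ = np.histogram(gray_image, bins=256, range=(0, 256))
    total = gray_image.size
    shadow_frac = hist[:tail_levels].sum() / total
    highlight_frac = hist[-tail_levels:].sum() / total
    return shadow_frac > threshold, highlight_frac > threshold
```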

When the brightness change is in the predetermined brightness range or when the mismatch is in a predetermined range, the camera is maintained in an initial capture state during the capturing of all of the evaluation images. When the brightness change and mismatch are outside the respective ranges, the camera is shifted to a second capture state prior to the capturing of one or more additional evaluation images. The second capture state is at least partially corrective of the mismatch. For example, the additional evaluation images can be focused to provide a depth of field that better matches distances to subject matter determined by the rangefinder. In another example, a second capture state can change exposure to provide better tone scale in highlights or in shadows.

After the evaluation images are captured and analyzed, a final camera state is determined using the set of evaluation images. Each of the capture states includes settings of a plurality of: focal length, exposure time, focus distance, aperture, white balance adjustment, and flash state. One or more final images are then captured with the camera in the final camera state.

The steps leading to capture of the final images can be free of user intervention other than an initial actuation of evaluation image capture and a tripping signal actuating final image capture. Alternatively, the camera can display an indication of the mismatch to the photographer prior to the setting of the final capture state and accept user input designating one of a plurality of capture states as the final capture state. The indications can be evaluation images captured when the camera was in the second capture state. For example, the camera can display an indication of a capture state that would decrease tonality accumulations in shadows and an indication of another capture state that would decrease tonality accumulations in highlights. Similarly, the camera can display evaluation images captured with different depths of field.

FIG. 8 presents a detailed flowchart of a particular embodiment incorporating the methods of both FIG. 6 and FIG. 7. FIG. 9 presents a modification of the method of FIG. 8, in which brightness changes are not considered.

FIG. 10 presents another modification of the method of FIG. 8, in which subject motion is not considered.

In FIG. 8, the overall decision flow is essentially a continuous loop, from start block 100 to end block 198 and back to start 100, with an occasional branch for capture of a final still image. In this embodiment, evaluation images are captured in a continuous stream and are continuously analyzed in the evaluation cycle of FIG. 8, when the camera is active and the user is composing the scene prior to actuating the shutter release. The evaluation images can be at a lower resolution than the final image.

Processing begins at the top of FIG. 8, at start block (100). Next, focus image data is acquired (103) and preview image data is acquired (105). In both cases, the image data is supplied by two or more evaluation images. The focus data includes lens focus distance information and a specialized image that can be analyzed for local contrast (edge content). The focus data can be produced by performing edge enhancement of one or more of the evaluation images. The preview image data is the image data of two or more of the evaluation images or subsampled versions of those images. For convenience, in the following discussion the preview image data is treated as being the respective evaluation images. It will be understood that like considerations apply to subsampled or otherwise modified images.

Following the acquisition of focus data, the autofocus (AF) analysis operation is performed (110). Focus image data from the image sensor is filtered with band-pass and high-pass filters to produce local contrast values. The local contrast values, along with the lens focus distance, are analyzed to provide an understanding of the subject matter distance of one or more of the evaluation images. The focus image data can include information, such as lens focus distance and local contrast values, retained from previous iterations of the evaluation cycle. Lens focus distance can be changed between cycles using the lens focus control 54. Focus determination procedures using such information, sometimes referred to as a “through-focus” approach, are well known to those of skill in the art. The result of this analysis is effectively a range map of best focus distance for different portions of the scene.
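
A through-focus search scores candidate lens positions by local contrast. The sketch below assumes a simple high-pass (Laplacian) energy measure per grid region; the filter choice and grid size are illustrative assumptions, not the disclosed filters:

```python
import numpy as np
from scipy.ndimage import laplace  # stand-in high-pass filter

def focus_scores(focus_image, grid=(4, 4)):
    """Score local contrast (edge content) for each region of a grid."""
    hp = laplace(focus_image.astype(float))
    rows = np.array_split(hp, grid[0], axis=0)
    return [[float(np.mean(block ** 2)) for block in np.array_split(r, grid[1], axis=1)]
            for r in rows]

# Sweeping lens focus distance between cycles and keeping, per region, the
# distance with the highest score yields the "range map of best focus
# distance" described above.
```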

The focus image data can additionally or alternatively include range information from a rangefinder instead of or in addition to image sensor information. The range information provided by the rangefinder 56 of the camera of FIG. 1 is in the form of a range map. Through-focus and rangefinder approaches are only two options among many for acquiring a map of distances to different portions of the scene. Other approaches can also be used.

Display images for presentation (120) on the display are prepared from the evaluation images. One or more operations may be required for conversion of the evaluation images into display images. Conversion includes such procedures as resizing, balancing, and color correcting the image for display on the image display.

Subject motion analysis is also performed (115) on the evaluation images. The current evaluation image is compared to the previous evaluation image, determining what subject motion has occurred between the two images. Typical intentional camera movements are low frequency, no more than 1-2 Hz, while hand tremor commonly occurs at 2-10 Hz. Thus, low-pass temporal filtering can be applied to the motion estimates to distinguish deliberate motions from high-frequency jitter. Many procedures are known for motion estimation.
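
Such low-pass temporal filtering can be as simple as a first-order recursive filter over successive motion estimates; the frame rate and smoothing coefficient below are assumed tunings, not values from this disclosure:

```python
def lowpass_motion(estimates, alpha=0.2):
    """First-order IIR low-pass over a sequence of per-frame motion estimates.

    With evaluation images arriving at roughly 30 frames/s, alpha near 0.2
    gives a cutoff on the order of 1 Hz (an assumed tuning), passing slow
    deliberate motion while attenuating higher-frequency jitter.
    """
    smoothed, state = [], 0.0
    for m in estimates:
        state = alpha * m + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed
```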

U.S. Pat. No. 6,130,912 and U.S. Pat. No. 6,128,047 disclose the use of integral projection for motion estimation. A block-based motion estimate is disclosed in “Efficient Block Motion Estimation Using Integral Projections”, K. Sauer and B. Schwartz, IEEE Trans. on Circuits and Systems for Video Technology, 6(5), 1996, pages 513-518. The integral projections are within a block-matching framework and are subject to the limitations of block-based techniques. The use of full-image integral projections in computing a global expansion of a block-based motion estimate is disclosed in “Real-time Digital Video Stabilization for Multi-media Applications”, K. Ratakonda, IEEE Int'l Symposium on Circuits and Systems, 1998, vol. 4, pages 69-72.

One procedure uses block-based motion analysis, as illustrated in FIG. 14. An evaluation image 610 has a block of pixels 600 within it, defining a rectangular zone of interest within the evaluation image. The previous evaluation image 620 is searched for a block of pixels matching the block of pixels in block 600. In this example, the block of pixels in image 620 that matches best is block 630. Accordingly, the vector from the corner of block 600 to the corner of block 630 is the estimated motion vector for this block of pixels. This process is repeated for multiple blocks of pixels in evaluation image 610 and previous evaluation image 620, developing a set of motion estimates for different regions of the scene. If block-based motion estimation is used, it is desirable to implement techniques to reject blocks that likely provide spurious motion estimates. Such techniques are known to those of skill in the art. If computational resources allow, even more complex motion analyses, such as those involving segmentation of moving objects, can be used to advantage.
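
Block matching of the kind FIG. 14 describes is commonly implemented as a sum-of-absolute-differences (SAD) search; the following is a generic sketch of that technique, with block size and search radius chosen arbitrarily:

```python
import numpy as np

def match_block(current, previous, top, left, size=16, search=8):
    """Find the offset in `previous` that best matches a block of `current`.

    Exhaustive SAD search over a +/- `search` pixel window; the returned
    (dy, dx) is the estimated motion vector for this block.
    """
    block = current[top:top + size, left:left + size].astype(int)
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > previous.shape[0] or x + size > previous.shape[1]:
                continue  # candidate block falls outside the previous image
            cand = previous[y:y + size, x:x + size].astype(int)
            cost = np.abs(block - cand).sum()
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best
```

Repeating this over a grid of blocks yields the per-region motion estimates described above; blocks with flat texture or ambiguous matches would then be rejected as likely spurious.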

In a particular embodiment, motion estimation is based on integral projection. This approach is relatively efficient. Block-based techniques, especially ones using blocks that are similar in size to those used for video compression, can pick up finer motion than would be easily detectable using projection techniques, but require more computing resources. Referring to FIGS. 16A-17B, horizontal and vertical image projection vectors are formed by summing the image elements in each column to form horizontal projection vectors, and summing the elements in each row to form vertical projection vectors.

In FIG. 16A, a captured evaluation image is shown broken into four vertical bands 902. Pixels in each of these bands 902 are summed into projection vectors 903. FIG. 16B shows an expanded view of this process. The vertical projection vector 903 is formed by summing various data points 901 within the overall Y component image data for band 902. In the illustrated embodiment, only a subset of the image data is used when forming the vertical projection vector. In FIG. 16B, only every fifth pixel of each row of the image data is included in the summation. Additionally, only every second row is considered in the summation and creation of projection vector 903. As shown in FIG. 16A, several vertical projection vectors 903 are formed from multiple bands of the image 902. For simplicity, these bands do not overlap, though as the number of bands is increased, there can be an advantage to allowing some overlap. During analysis, segments 905 of each projection vector are analyzed. Dividing the evaluation image into bands and segments allows multiple motion estimates for each pair of evaluation images analyzed.

In FIG. 17A, a captured evaluation image is shown broken into three horizontal bands 952. Pixels in each of these bands 952 are summed into projection vectors 953. FIG. 17B shows an expanded view of this process. The horizontal projection vector 953 is formed by summing various data points 951 within the overall Y component image data for band 952. In the illustrated embodiment, only a subset of the image data is used when forming the horizontal projection vector. In FIG. 17B, only every fourth pixel of each column of the image data is included in the summation. Additionally, only every second column is considered in the summation and creation of projection vector 953. As shown in FIG. 17A, several horizontal projection vectors 953 are formed from multiple bands of the image 952. For simplicity, these bands do not overlap, though as the number of bands is increased, there can be an advantage to allowing some overlap. During analysis, segments 955 of each projection vector are analyzed. Dividing the evaluation image into bands and segments allows multiple motion estimates for each pair of evaluation images analyzed.
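
By way of illustration only, the following Python sketch forms banded, subsampled vertical projection vectors of the kind described above; the horizontal case is symmetric (summing down columns of horizontal bands). The function name, the use of numpy, and the default parameter values are assumptions for illustration, chosen to mirror FIGS. 16A-16B.

```python
import numpy as np

def vertical_projection_vectors(y, n_bands=4, pixel_step=5, row_step=2):
    """Form one vertical projection vector per vertical band of a luma image.

    y          -- 2-D numpy array of Y (luma) values
    n_bands    -- number of non-overlapping vertical bands (four in FIG. 16A)
    pixel_step -- keep only every pixel_step-th pixel along each row
    row_step   -- keep only every row_step-th row
    Returns an (n_bands, n_rows_used) array; each row of the result is a
    projection vector indexed by vertical position.
    """
    h, w = y.shape
    band_w = w // n_bands
    vecs = []
    for b in range(n_bands):
        band = y[::row_step, b * band_w:(b + 1) * band_w:pixel_step]
        vecs.append(band.sum(axis=1))       # sum across the band's columns
    return np.asarray(vecs, dtype=np.int64)
```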

Much of the burden of estimating motion via integral projections resides in the initial computation of the projection vectors. If necessary, this complexity can be reduced in two ways. First, the number of elements contributing to each projection sum can be reduced by subsampling, as shown in FIGS. 16B and 17B. A second subsampling can be achieved by reducing the density of the projection vectors themselves, for example, by including only every other column when forming the horizontal projection vector. This type of subsampling reduces complexity even more, because it also decreases the complexity of the subsequent matching step to find the best offset, but it comes at a cost of reduced resolution for motion estimates.

The subset of imaging data to be used for the horizontal and vertical projection vectors can be selected heuristically, with the understanding that reducing the number of pixels reduces the computational burden, but also decreases accuracy. For accuracy, it is currently preferred that total subsampling reduce the number of samples by no more than a ratio of 4:1 to 6:1. Further, if resources are available, it is preferred not to subsample at all in creating the projection vectors.

The use of multiple partial projection vectors rather than full-image projection vectors reduces the effect of independently moving objects within images on the motion estimate. The number of partial projection vectors in each direction need not be large for good results. For example, in a particular embodiment shown in FIGS. 16A and 17A, 12 horizontal and 12 vertical motion estimates are obtained. That is, vertical motion estimates are obtained for three segments 905 of each vertical projection vector 903. Similarly, horizontal motion estimates are obtained for four segments 955 of each horizontal projection vector 953.

FIGS. 15A-15B illustrate comparing the corresponding partial projection vectors between corresponding partial areas of two images. Given length-M horizontal projection vectors and a search range of R pixels, the partial vector 801 of length M-2R from the center of the projection vector for image n-1 is compared to partial vectors from image n at various offsets 802, 803. The comparison yielding the best match is chosen as the best motion estimate in the respective direction. The best match is defined as the offset yielding the minimum distance between the two vectors being compared. Common distance metrics include minimum mean absolute error (MAE) and minimum mean squared error (MSE). In a particular embodiment, the sum of absolute differences is used as the cost function to compare two partial vectors, and the comparison having the lowest cost is the best match.

The search for lowest-cost offsets for each segment is conducted with segments of the original projection vectors, simply checking the match for each offset in a given range (such as offsets −10, −9, −8, . . . −1, 0, 1, 2, 3, . . . 8, 9, 10). This requires computing a given number of MAE values, such as 21 in this example.
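
A minimal sketch of this exhaustive search, using the sum of absolute differences as the cost function, might look as follows; the function name and the return convention are illustrative assumptions.

```python
import numpy as np

def best_offset(prev_vec, curr_vec, search_range=10):
    """Exhaustive match of the center of a projection-vector segment.

    prev_vec, curr_vec -- equal-length (M) segments from images n-1 and n
    search_range       -- R; every offset in -R..R is checked (21 here)
    Returns (offset, cost), where cost is the sum of absolute differences.
    """
    R, M = search_range, len(prev_vec)
    ref = prev_vec[R:M - R].astype(np.int64)     # length M-2R center piece
    costs = {o: np.abs(ref - curr_vec[R + o:M - R + o]).sum()
             for o in range(-R, R + 1)}
    best = min(costs, key=costs.get)
    return best, costs[best]
```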

An approach that saves computing power is to conduct a two-stage hierarchical search. The simplest approach is to conduct a first-stage search with only a subset of offsets (such as −10, −8, −6, . . . 0, 2, . . . 8, 10). Once an offset is found that provides the best match in the sparse search, several additional offsets are checked around that minimum to determine the precise offset resulting in the minimum cost.
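
A sketch of the two-stage variant, under the same assumptions as the exhaustive version above, checks only even offsets first and then refines around the coarse minimum:

```python
import numpy as np

def best_offset_two_stage(prev_vec, curr_vec, search_range=10, refine=1):
    """Two-stage search: coarse scan over even offsets, then local refinement.

    refine controls how many neighboring offsets are rechecked around the
    coarse minimum (an illustrative parameter, not from the disclosure).
    """
    R, M = search_range, len(prev_vec)
    ref = prev_vec[R:M - R].astype(np.int64)
    cost = lambda o: np.abs(ref - curr_vec[R + o:M - R + o]).sum()  # SAD
    coarse = min(range(-R, R + 1, 2), key=cost)      # sparse first stage
    candidates = [o for o in range(coarse - refine, coarse + refine + 1)
                  if -R <= o <= R]
    best = min(candidates, key=cost)                 # fine second stage
    return best, cost(best)
```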

After horizontal and vertical offsets for each segment and band are determined, further analysis of the motion estimates and costs allows discrimination between still scenes and scenes with a high degree of action. The mean of the absolute values of the valid offset estimates provides one indication of scene activity. This indicator correlates with global motion and camera motion. A second indicator of scene activity is the range of valid offset estimates, which correlates more closely with motion in portions of the scene. A third indicator used in the particular embodiment is the average of the cost values corresponding to the valid offset estimates.
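
The three indicators reduce to a few lines of arithmetic; a sketch, assuming the offsets and costs have already been screened for validity:

```python
import numpy as np

def activity_indicators(valid_offsets, valid_costs):
    """The three scene-activity indicators described above.

    valid_offsets -- signed offsets surviving the validity checks
    valid_costs   -- matching costs for those offsets
    """
    o = np.asarray(valid_offsets, dtype=float)
    return {
        "mean_abs_offset": float(np.abs(o).mean()),   # global/camera motion
        "offset_range": float(o.max() - o.min()),     # local scene motion
        "mean_cost": float(np.mean(valid_costs)),     # overall match quality
    }
```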

Integral projections can fail as a motion estimation technique under various conditions. For several conditions, failure can be mitigated by requiring motion estimate components to exceed a heuristically predetermined minimum value.

A failure condition can occur when the scene contains a repeated pattern, such that multiple different motion estimates yield similar costs. This case can be identified by ascertaining not only the best motion estimate, but also the second and third best motion estimates. Under normal circumstances, these three best motion estimates will be clustered together. If the difference between them is greater than a predetermined value, then a repeated pattern may be present. In that case, the motion estimate closest to zero can be selected. As an alternative, the cost function, such as MAE, can be scaled by a simple function of the magnitude of the motion estimate. The cost for each motion estimate is increased by a simple function of the magnitude of the estimate, such as the following equation:

$C_{m} = C \times f^{|O|}$

In this equation, O is the (signed) offset or motion estimate, f is a value that would typically range from 1.0 to 1.2, C is the usual cost function, and C_(m) is the final modified cost function. This scaling process increases the cost value as the offset moves away from zero. Several local minima in the cost function will be scaled by different values because they are located at different offsets. If there is only one global minimum, this scaling function has little effect, because the change in cost scaling for a unit change in offset is slight.
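
Applied to the offset-to-cost map produced by the search sketches above, the scaling is a one-liner (the function name and default f are illustrative):

```python
def scaled_costs(costs, f=1.1):
    """Apply C_m = C * f**|O| to an {offset: cost} map (f typically 1.0-1.2)."""
    return {o: c * f ** abs(o) for o, c in costs.items()}
```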

Another condition that can cause motion estimation to fail is a portion of a scene having very little local contrast. In this case, all motion estimates have similar accuracy, and the best offset can be determined incorrectly due to noise. This case can be identified by tracking the average cost of all motion estimates, as well as tracking the best cost. If the ratio between the average cost and the best cost is too small, that suggests a scene region with little content, and the respective motion estimates are flagged as invalid.

A similar failure occurs when a smooth gradient exists in the scene. In this case, exposure and other differences can easily be confused with scene motion. To resolve this, checks can be made for changes in the sign of the first derivative in the projection vector segment from the current image. Since taking a derivative is a noisy process and only larger changes are of interest, the projection vector segment values can be scaled down prior to taking the first derivative. Projection vector segments that have too few changes in the first derivative can be omitted from the motion estimation.

Another situation that can result in integral projection failure is exposure change from one image to the next. This situation can be addressed by summing the values in the integral projection vectors to obtain an overall intensity value for each vector. These values can be used to adjust the projection vectors prior to evaluating the various motion estimate offsets.
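
The three mitigations just described (low-contrast ratio test, gradient sign-change test, and exposure normalization) might be sketched as follows; the threshold values and the right-shift used for scaling are assumptions, not values from the disclosure:

```python
import numpy as np

def segment_is_valid(seg, best_cost, avg_cost,
                     min_ratio=2.0, min_sign_changes=3, scale_shift=4):
    """Flag projection-vector segments likely to give spurious estimates."""
    if best_cost > 0 and avg_cost / best_cost < min_ratio:
        return False                       # too little content (low contrast)
    d = np.diff(np.asarray(seg, dtype=np.int64) >> scale_shift)
    signs = np.sign(d[d != 0])
    if (signs[1:] != signs[:-1]).sum() < min_sign_changes:
        return False                       # smooth gradient, not real detail
    return True

def normalize_for_exposure(prev_vec, curr_vec):
    """Equalize overall intensity of two vectors before offset matching."""
    s_prev, s_curr = prev_vec.sum(), curr_vec.sum()
    return prev_vec, curr_vec * (s_prev / max(s_curr, 1))
```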

For the current purpose, discrimination between camera motion and motion within the scene is desirable but not critical. Clearly distinguishing between camera motion and motion within the scene allows for more intelligent behavior when the user is panning the camera. Being able to discount motion due to deliberate camera panning allows better analysis of motion of the main subject. For example, a capture of a race car with the camera being held steady can be optimized slightly differently than a capture of the same race car when the user is carefully panning with the race car. In the second case, a longer exposure time would be in order, to emphasize the blur in the background. Camera motion can be detected by use of one or more motion sensors.

At the same time, casual photographers rarely pan a camera in a highly controlled way. For these users, camera motion often correlates with high amounts of jitter and large amounts of motion within the scene. Thus, even limited intelligence that identifies significant motion, whether from camera motion or from scene motion, is of value in improving most image capture scenarios.

Exposure analysis (AE) 122 is also performed. The objective of the analysis is to estimate the optimum exposure for the main subject of the evaluation image. A variety of techniques are well known to those of skill in the art. For example, a simple approach is to place the middle of an exposure range at the mean or median of a group of pixels corresponding to the nearest subject detected by a rangefinder. Other exposure analysis techniques average the brightness of different portions of the scene with different weighting factors. The weighting factors are based on secondary attributes such as pixel clipping, color saturation, proximity to edges in the scene, and other factors. The exposure analysis is used to control the exposure of the next evaluation image and to control the exposure of a final image.

White balance (AWB) analysis 123 is also performed. The objective of this analysis is to determine the best set of red, green, and blue balance gains to provide an appropriate neutral balance for the evaluation image. A variety of techniques are also well known in the art. A simple technique computes adjustments of the red, green, and blue gains of all of the pixels of the image to provide a neutral balance. Other techniques compute the color balance for different portions of the scene and compute an average balance for the overall scene using weighting factors for each portion of the scene. The weighting factors depend on attributes of the image, such as lightness, color saturation, and proximity to detected edges in the scene.

Simple feature analysis 124 is also performed on the evaluation image. The simple feature analysis 124 complements the other (AE, AF, AWB, motion) analyses 110, 115, 122, 123 and, together with analyses 110, 115, 122, 123, provides the first assessment. The simple feature analysis 124 has moderate computational demands. It is preferred that the simple feature analysis 124 and the other analyses of the first assessment reach completion within the refresh frame time defined by a refresh of the camera image display. In a particular embodiment, this time limit is 30 milliseconds. For clarity, the analysis blocks 110, 115, 122, 123, and 124 are shown separately. In fact, there are advantages to combining aspects of the analyses, and the precise functions can be mixed and combined.

One example of simple feature analysis is skin detection (skin color region detection). The use of camera metadata alone, such as focus distance, focal length, and scene brightness, to identify portrait scenes results in a high number of false positive portrait classifications. This happens when scenes do not contain portraits of people but are captured under conditions similar to those used for capturing portraits. For example, if an object such as a bookcase is captured from a distance of about 1 meter, an algorithm based solely on focus distance, focal length, and scene brightness is likely to classify the scene as a portrait, due to the fact that the image capture parameter settings are likely to resemble those used during the capture of a portrait. In this case, the scene is not a portrait.

The accuracy with which portrait scenes can be differentiated can be improved if the presence of skin data in the scene is taken into account during scene classification, along with other information, such as information provided by focus, exposure, and balance analysis. A scene containing one or more people that has been composed to include the head and shoulders is likely to contain a significant proportion of skin content. The presence of skin pixels in the scene can be used as an indication that a portrait-type scene is being captured. Any skin detection algorithm can be used to detect skin pixels during composition. In a particular embodiment, the skin detection method is the Bayesian decision rule for minimum cost (Jones and Rehg, “Statistical Color Models with Application to Skin Detection”, International Journal of Computer Vision, vol. 46, no. 1, January 2002).

A pixel, x, is considered as skin if:

$\frac{p\left( x \mid {skin} \right)}{p\left( x \mid {nonskin} \right)} \geq \tau$

where: x is a pixel color triple, preferably a YCC triple,

p(x|skin) is a 3D conditional probability density function of skin, and

p(x|nonskin) is a 3D conditional probability density function of non-skin. (A probability density function is also referred to herein as a “PDF”.) The variable τ is a predetermined skin detection threshold. For a pixel triple, x, the conditional PDF of skin, p(x|skin), returns a value that describes the probability that x is a skin pixel. A large value indicates a high probability that x is a skin pixel and a small value indicates a low probability that x is skin. Likewise, the conditional PDF of non-skin, p(x|nonskin), returns a value for x that describes the likelihood that x is a non-skin pixel. A large value indicates a high probability that x is any pixel other than skin and a small value indicates a low probability that x is a non-skin pixel.

To determine a skin PDF, a 3D skin histogram can be predetermined using known (ground-truth) skin pixel data, preferably in the YCC color space, although any three-color space can be used. The ground-truth skin pixel data can be generated manually by selecting skin pixels from images containing skin data. If desired for greater accuracy, the images used for ground-truth data can be evaluation images from the same camera or same type of camera. The skin histogram is converted to a skin PDF by dividing the value in each bin by the maximum value in the histogram, although the total count in the skin histogram can be used instead. Likewise, a non-skin PDF can be determined from a 3D histogram of non-skin pixels. The non-skin histogram is converted to a non-skin PDF using the same method employed for the skin PDF. It is preferred that the skin and non-skin PDFs are applied as 3D look-up tables (LUTs) with 32³ bins, although any other bin resolution can be used, such as 64³ or 128³.
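
A minimal sketch of building such a PDF LUT from ground-truth pixel triples, assuming 8-bit YCC data and normalization by the maximum bin value:

```python
import numpy as np

def pdf_lut_from_pixels(pixels, bins=32):
    """Build a 3D PDF look-up table from ground-truth YCC pixel triples.

    pixels -- (N, 3) array of 8-bit color triples
    bins   -- LUT resolution per channel (32**3 bins here)
    """
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    return hist / hist.max()   # normalize by the maximum bin value
```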

To reduce the memory requirements associated with storing the skin and non-skin LUTs in a digital camera, it is possible to combine the LUTs into a single 3D LUT where the bit depth of each element of the LUT is 8 bits, although any other bit depth may be used. To combine the skin and non-skin PDF LUTs, all bin values in the non-skin PDF LUT that are less than a predetermined threshold, such as 0.00061, are set equal to that threshold value, creating the PDF p(x|nonskin)′. Each value in the skin PDF LUT is divided by the value in the corresponding non-skin LUT according to the equation:

$p\left( x \mid {cskin} \right) = \frac{p\left( x \mid {skin} \right)}{p\left( x \mid {nonskin} \right)^{\prime}}$

For convenience in storage, the resulting PDF is quantized to 256 levels. A pixel x can be considered as skin if:

p(x|cskin) ≥ τ

where p(x|cskin) is the combined skin and non-skin PDF. A further saving in memory can be obtained if only non-zero values in the LUT are stored. Other techniques can be used to optimize storage and access, such as storing only PDF values for a restricted range of index values spanning the nonzero entries in the PDF. Index values outside these ranges will always return zero probability values; only values within these ranges must be looked up.
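
The combination step, sketched under the same assumptions, with rescaling to the quantized 8-bit range:

```python
import numpy as np

def combine_luts(skin_pdf, nonskin_pdf, floor=0.00061):
    """Combine skin and non-skin PDF LUTs into one quantized 8-bit LUT."""
    nonskin = np.maximum(nonskin_pdf, floor)       # clamp small bins
    ratio = skin_pdf / nonskin                     # p(x|cskin)
    ratio = ratio / ratio.max()                    # scale to [0, 1]
    return np.round(ratio * 255).astype(np.uint8)  # quantize to 256 levels
```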

Thresholding the ratio of skin to non-skin PDFs, or the combined skin PDF, results in a binary image (containing only 1's and 0's). Pixel values in the binary image containing a 1 correspond to skin pixels, while pixel values equal to 0 correspond to non-skin pixels. The skin detection threshold, τ, is selected such that the performance of the skin detector is optimized. Setting the threshold too low results in too many skin pixels and setting it too high results in too few skin pixels. A skin receiver operating characteristic (ROC) curve can be used to select an optimum skin threshold, τ. To generate a skin ROC curve, skin detection is applied to ground-truth skin and non-skin pixel data. The probability of false positive (the fraction of pixels that were mistakenly classified as skin) is plotted against the probability of true positive (the fraction of pixels that were correctly classified as skin) for a range of skin threshold values, τ. The value of τ that provides between 80% and 90% true positive rate can be selected. A false positive rate between 10% and 20% is typically obtained. Preferably, τ is selected from the point defined as the “equal error rate” of the ROC curve. This is where P_(falseRejection) = P_(falseDetection), where P_(falseRejection) = 1 − P_(correctDetection).
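
An illustrative sketch of picking τ at the equal error rate, given combined-PDF scores for labeled ground-truth pixels (the function name and the brute-force scan over candidate thresholds are assumptions):

```python
import numpy as np

def equal_error_threshold(skin_scores, nonskin_scores):
    """Pick tau where P(false rejection) is closest to P(false detection).

    skin_scores    -- combined-PDF values for ground-truth skin pixels
    nonskin_scores -- combined-PDF values for ground-truth non-skin pixels
    """
    best_tau, best_gap = 0, float("inf")
    for tau in np.unique(np.concatenate([skin_scores, nonskin_scores])):
        false_reject = np.mean(skin_scores < tau)      # 1 - true positive
        false_detect = np.mean(nonskin_scores >= tau)  # false positive
        gap = abs(false_reject - false_detect)
        if gap < best_gap:
            best_tau, best_gap = tau, gap
    return best_tau
```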

Those skilled in the art will appreciate that once τ is selected, the PDF can be thresholded and stored in single bits. Alternatively, storing the PDF with more precision enables adaptive adjustment of τ based on other analysis.

In checking for the presence of skin in an evaluation image, pixels are run through a three-dimensional lookup table (3D LUT) that produces a value indicating the probability of a pixel being a skin pixel, given the color of the pixel. The image can be preselected for this analysis by use of camera metadata, such as focus distance, focal length, and scene brightness. The resulting image produced using the 3D LUT shows the probability of each pixel being a skin pixel. Counting the number of pixels that have a skin probability over a predetermined threshold produces a feature that correlates with the probability that a scene is a portrait. If the total number of skin pixels in the binary skin map is greater than or equal to the predetermined threshold, then the scene is determined to be a portrait scene.
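
A sketch of the lookup and count, assuming the quantized 8-bit 32³ LUT from above (so τ here is on the 0-255 scale) and an 8-bit YCC image:

```python
import numpy as np

def skin_map(ycc, lut, tau):
    """Apply the combined 3D LUT to a YCC image and threshold at tau.

    ycc -- (H, W, 3) uint8 image; lut -- (32, 32, 32) uint8 table
    Returns a binary skin map.
    """
    idx = ycc >> 3                      # 8-bit values -> 32 bins per channel
    prob = lut[idx[..., 0], idx[..., 1], idx[..., 2]]
    return prob >= tau

def is_portrait(binary_map, min_skin_pixels):
    """Simple portrait test: total skin-pixel count against a threshold."""
    return binary_map.sum() >= min_skin_pixels
```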

Alternatively, connected component analysis (described, for example, in Haralick, Robert M., and Linda G. Shapiro, Computer and Robot Vision, Volume I, Addison-Wesley, 1992, pp. 28-48) can be applied to the binary skin map. The connected component analysis converts the binary image to a list of connected regions of pixels with the same value. In this case, the result is a list of connected regions of skin pixels. The largest connected skin pixel region is selected and the number of pixels in the region is found. If the number of pixels in the largest connected region is greater than or equal to a predetermined threshold, then the scene is determined to be a portrait scene.
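
Using an off-the-shelf labeling routine, the largest-region test is brief; scipy's `ndimage.label` is one possible implementation, used here purely for illustration:

```python
import numpy as np
from scipy import ndimage

def largest_skin_region(binary_map):
    """Size in pixels of the largest connected region of skin pixels."""
    labels, n = ndimage.label(binary_map)
    if n == 0:
        return 0
    return int(np.bincount(labels.ravel())[1:].max())  # skip background (0)
```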

A higher rate of true positives is obtained if the method using connected region component analysis is used. This is due to the fact that large regions of connected skin pixels are more likely to exist in scenes containing people's faces. A large number of small connected regions is unlikely to exist in portrait-type scenes containing faces. The method of thresholding against the total number of skin pixels in the skin map may result in more false positives than thresholding against the size of the largest connected region.

An alternative method of integrating skin detection into the classification of portrait scenes is to create a membership (weighting) function for each scene type, in which the parameter is the total number of skin pixels or the total number of pixels in the largest connected region of skin pixels. The membership function can be determined from the statistics of skin pixel distributions in images taken from a database of scenes. For example, a database of images captured using the same type of digital camera is formed. Images in the database are manually classified into scene types. Skin detection is applied and the total number of skin pixels detected in each image is computed. A histogram that describes the frequency of scenes as a function of the total number of skin pixels is created for each scene type. A membership weighting function can be determined simply by normalizing the frequency distribution. The skin analysis statistics are combined with the image magnification, lens focal length, and scene brightness to compute an overall degree of portrait-ness. If the scene is determined to be a portrait (having the highest probability among the available choices), then capture parameters are set for capturing a portrait scene.

To save computing resources, skin detection can be performed only if other analysis (such as focus distance and focal length) indicates a reasonable probability that a scene could be a portrait. Alternatively, more accuracy in identifying portrait scenes can be obtained if skin detection is always used and the skin-based probability is combined with portrait-type probability based on other measures.

The capture state defined for a portrait scene can include parameters setting the exposure control system to use the widest aperture possible and provide a low exposure index to minimize noise.

Another example of simple feature analysis is the preparation of a histogram of scene colors and comparison of the histogram to one or more predetermined color distributions that are characteristic of important capture scenarios, such as capture of a sunset. Another simple analysis is to prepare and analyze an exposure histogram to establish the exposure range of the image. This information can be used, in a manner well known to those of skill in the art, to determine if a flash exposure or use of fill flash would be warranted.

The analyses 110, 115, 122, 123, and 124 of the first assessment define a capture state, which may or may not be different than an initial default capture state assumed by the camera prior to the analyses. This capture state can be used to capture further evaluation images. The defined capture state is redetermined at each iteration of the evaluation cycle and changes with changes in scene and exposure conditions. Camera settings can be changed at this time to match the defined capture state, or the change in camera settings can be delayed until needed.

Because of the time-critical nature of scenes with motion, the evaluation cycle of FIG. 8 includes a first decision point to determine whether the scene is an action scene. The detected motion is compared to a predetermined motion threshold. The particular motion threshold used is a function of the type of motion analysis performed and can be determined heuristically. If the motion analysis 115 indicates motion in excess of the predetermined threshold, then the scene is determined to be an action scene and the parameters of the capture state are set (135) for optimum capture of action. For example, the exposure control parameters are set to maintain a limited exposure time, adjusting gain and aperture accordingly. The limit on exposure time can be calculated from an estimate of the amount of motion in the scene. Alternatively, the limit on exposure time is allowed to vary as a function of ambient light level and scene content. This allows a balancing of motion blur against image noise.

In a particular embodiment, capture settings have an exposure index that is automatically calculated from the estimated scene brightness. This can be implemented using a simple table that has estimated scene brightness as the index variable and provides an exposure index or exposure value output. In this table, the exposure index decreases as the scene brightness increases. For an action scene, a table with a higher set of exposure index values is used. The rate at which the exposure index increases as brightness drops provides a balancing of motion blur with noise for a predetermined average scene.
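
A sketch of such a table-driven lookup is shown below. The brightness points and ISO values are hypothetical placeholders, not values from the disclosure; the only property carried over is that the exposure index rises as brightness drops, and that the action table sits uniformly higher.

```python
import numpy as np

# Hypothetical brightness-to-exposure-index tables (Bv in stops).
BV_POINTS  = np.array([-2.0, 0.0, 2.0, 4.0, 6.0])
ISO_NORMAL = np.array([1600, 800, 400, 200, 100])
ISO_ACTION = np.array([3200, 1600, 800, 400, 200])  # higher EI limits blur

def exposure_index(scene_bv, action=False):
    """Interpolate an exposure index from estimated scene brightness."""
    table = ISO_ACTION if action else ISO_NORMAL
    return float(np.interp(scene_bv, BV_POINTS, table))
```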

In another embodiment, a capture setting is determined by selecting a fixed exposure time and calculating aperture and exposure index using a shutter-priority exposure program. Exposure (aperture, exposure index, flash control, exposure time) control calculations can be performed ahead of time, as needed. The intent here is to meet the simple objective of limiting motion blur.

After setting capture parameters for optimum capture of motion, a determination is made (155) as to whether the user has triggered capture of a final image. If triggered, final image capture is initiated (195). If final image capture has not been triggered, then complex feature analysis (150) is begun. Complex feature analysis (150) also begins following a determination (130) that the scene is not an action scene.

Referring initially to FIG. 8, in embodiments illustrated by this figure, the complex feature analysis (150) considers additional evaluation images and provides a second assessment. Complex feature analysis (150) can also consider the first assessment. Each of the assessments can be based upon two or more evaluation images. For convenience, the discussion of FIG. 8 generally refers to a particular embodiment, in which the complex feature analysis considers a first assessment based upon an initial set of evaluation images and a second assessment based upon one additional evaluation image. Like considerations apply to other embodiments. The complex feature analysis considers the same kinds of analyses as earlier discussed: focus analysis 110, motion analysis 115, exposure analysis 122, balance analysis 123, and simple feature analysis 124, but over the longer time interval of the initial evaluation images and the additional evaluation images. Complex feature analysis (150) can also include any analysis that takes longer than a few milliseconds and, thus, does not fit into the simple feature analysis (124).

During complex feature analysis (150), a determination (160) is made whether capture (170) of an additional evaluation image is needed. When highlights are determined to be significantly clipped or shadows are determined to be blocked up, an additional evaluation image at lower or higher exposure is captured (170). For highlights, the additional evaluation image is at a capture setting that provides a lower exposure level (such as ¼ the previous exposure). For blocking up of scene shadows, an alternate evaluation image at higher exposure (such as 2 or 4 times the previous exposure) is requested. This additional evaluation image is considered in the continuing complex feature analysis (150) with the knowledge that the respective capture setting was deliberately at a lower or higher exposure relative to the other evaluation images under analysis. If the darker or lighter additional evaluation image has only limited clipping of highlights or blocking of shadows, then the additional evaluation image can be analyzed in relation to other criteria, such as whether the scene has color characteristics of a sunset. It is preferred that only one or two additional evaluation images be captured, so that the displayed images on the digital viewfinder or camera display do not become jerky or non-responsive to the efforts of the user to compose the scene. For this reason, the capture settings of the additional evaluation images are preferably adjusted to maximize available information, as opposed to approximating the final capture setting.

The preview display 120 can be adjusted to compensate for the darker or lighter exposure, so that the user is presented visually consistent display images. Some or all of the previous evaluation image can be carried over for display purposes, by compositing blocks or segments of different evaluation images to form each display image. Alternatively, evaluation images at different exposures can be captured and presented on the display, or the display can keep showing an earlier image in place of an additional evaluation image. These approaches result in momentary degradations of live digital viewfinding, but it is expected that such degradations would be acceptable to the user.

One simple determination of tonality accumulations that can be used is based on the cumulative histogram of the luminance channel of the evaluation image. A cumulative histogram having more than a predetermined percentage of pixels above a predetermined highlight threshold or below a predetermined shadow threshold is considered to have highlight clipping or shadow block-up, respectively. In a particular embodiment, having 10 percent or more of the pixels above a highlight threshold indicates highlight clipping, and having more than 30 percent of the pixels below a shadow threshold indicates shadows are blocked up.
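
A sketch of this test, where the 10%/30% fractions follow the embodiment just described but the code-value thresholds (240 and 16 on an 8-bit scale) are assumptions:

```python
def clipping_check(luma, high_thresh=240, low_thresh=16,
                   high_frac=0.10, low_frac=0.30):
    """Check an evaluation image's luma channel (a numpy array) for
    highlight clipping and shadow block-up."""
    n = luma.size
    highlight_clipped = (luma >= high_thresh).sum() / n >= high_frac
    shadows_blocked = (luma <= low_thresh).sum() / n > low_frac
    return highlight_clipped, shadows_blocked
```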

After the complex feature analysis (150) is completed, a decision is made that no further additional evaluation images are needed, and a final capture state is set (180) to provide an optimum capture of the final image of the scene.

FIG. 11 shows the processing flow for the complex feature analysis (150) of FIG. 8. Complex feature analysis starts at block (200). The first analysis (210) is to analyze for highlight clipping and blocking up in shadows. A simple way to perform this is to count the number of pixels at or above a highlight threshold and those at or below a shadow threshold. A histogram of the luma (Y) channel of a YCbCr evaluation image makes this very efficient. This process is quite simple and suffices for most scenes. If processing power is available to produce a range map with adequate resolution, the range map is coupled with analysis of which regions in the scene are clipped or blocked up. If the range map and other analysis suggest a clear main subject in the midtone region, then the significance of the clipped or blocked-up regions is lessened.

Indicators for changes in scene exposure are then calculated (220). The purpose here is to determine whether the scene is changing in brightness. If no change or a small change in scene brightness is detected, it is assumed that the brightness will remain unchanged for the time required to capture and analyze an evaluation image with the camera in an alternative capture state. If a large change in scene brightness is detected, it is assumed that the current capture state is inappropriate and that capture and analysis of another evaluation image is needed to determine a new capture state. If a moderate change in scene brightness is detected, it is assumed that it is better to capture a final image with the camera in the current capture state than to delay for the time necessary to capture and analyze another evaluation image. These assumptions have been determined to be practical for most consumer picture-taking.

A simple calculation for scene exposure change is to compare the number of highlight pixels, the number of shadow pixels, and the mean of all other pixels in the evaluation image with the same statistics from the previous evaluation image. When making this comparison, any change in camera exposure (gain, aperture, integration time, etc.) is considered, so as to limit the determination to actual scene brightness differences. A simple way to accomplish this is to use a lookup table to adjust the histogram of the previous evaluation image for any change in camera exposure and recalculate the highlight, midtone, and shadow statistics. This method has limited accuracy when large exposure changes are being made, but that accuracy is sufficient for the purposes here.
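
An illustrative sketch of the histogram adjustment follows. The linear remap of code values by the exposure ratio ignores the camera's tone curve and is a simplifying assumption, as are the threshold values; it mirrors the limited-accuracy LUT approach described above.

```python
import numpy as np

def exposure_adjusted_stats(prev_hist, exposure_ratio,
                            high_thresh=240, low_thresh=16):
    """Re-bin the previous image's 256-bin luma histogram as if captured at
    the current exposure, then recompute highlight/shadow/midtone stats.

    exposure_ratio -- current camera exposure / previous camera exposure
    """
    counts = np.zeros(256)
    for v, c in enumerate(prev_hist):
        counts[min(int(v * exposure_ratio), 255)] += c   # LUT-style remap
    total = counts.sum()
    highlights = counts[high_thresh:].sum() / total
    shadows = counts[:low_thresh + 1].sum() / total
    mid = counts[low_thresh + 1:high_thresh]
    mid_mean = (np.arange(low_thresh + 1, high_thresh) * mid).sum() / max(mid.sum(), 1)
    return highlights, shadows, mid_mean
```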

Referring again to FIG. 11, the next step is calculation (230) of scene change response factors. This analysis can be used to tune adaptive temporal filters to speed up response when the scene content is deemed to be changing, or to provide greater smoothing when scene content is stable. For example, exposure changes should be quick to respond when the scene composition is changing, yet should be damped when the scene composition is stable. FIG. 13 illustrates a scenario in which such analysis is useful. Frame 510 outlines one possible capture composition that is largely a forest scene with a person in the foreground. Frame 520 outlines another possible capture composition that is a sunset with a person in the foreground. Frame 530 outlines another possible capture composition that is largely a portrait with a forest background. As a user composes each capture, such as frame 520, the method provides that modest motions yield essentially stable balance, exposure, and focus behavior. However, when the user shifts from one composition to another, such as from frame 520 to 510, the method enables rapid adjustment of the capture settings for each dramatically different composition. The same occurs with user composition of the scene using zoom (focal length) changes.

The determination of scene changes is based primarily on similarity of focus analysis 110, motion analysis 115, exposure analysis 122, balance analysis 123, and zoom from evaluation image to evaluation image. For example, small global motion estimates are consistent with normal camera jitter, while a larger range of motion estimates, with vectors going in different directions, indicates significant scene motion. Further, a set of motion vectors with similar values (and significant magnitude) indicates a deliberate user change of scene. This would be a pan in the case of video; in the case of preview before a still capture, it is simply a change in composition. Small changes in the exposure histogram indicate minor scene changes that do not require balance or exposure changes, while large changes indicate a need for rapid changes in exposure and balance. Other metrics, such as changes in edge maps from image to image, require more processing, but can provide more precise indicators of what is changing from image to image. Those skilled in the art will appreciate that other metrics can be used, especially as available processing power increases.
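
The motion-vector portion of this reasoning can be sketched as a coarse classifier; the two pixel thresholds are illustrative assumptions:

```python
import numpy as np

def classify_motion(offsets, jitter_limit=2, pan_spread=3):
    """Rough classification of a set of per-segment motion estimates."""
    offsets = np.asarray(offsets, dtype=float)
    mean_abs = np.abs(offsets).mean()
    spread = offsets.max() - offsets.min()
    if mean_abs <= jitter_limit:
        return "still"            # small estimates: normal hand jitter
    if spread <= pan_spread:
        return "pan"              # similar, large vectors: deliberate change
    return "scene_motion"         # divergent vectors: subject motion
```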

After calculation of scene change response factors comes the decision block 240, testing whether the evaluation images indicate a moderate exposure change. If a moderate exposure change is found, then control goes to block 260 to compute depth of field and range indicators. This allows moderate changes in exposure to stabilize without taking time to obtain one or more additional evaluation images at alternate exposures. If the scene exposure is stable or a substantial change in exposure is found, then the process proceeds to decision block 245, testing whether an additional evaluation image at a different capture setting is needed to provide an alternate exposure. This decision is based on whether there is significant clipping of highlights or blocking up in shadows. If either of these is true, then the capture setting to provide the alternate exposure is calculated in block 250. The alternate exposure is either much lower (if clipped highlights are more significant than blocked-up shadows) or much higher (if clipped highlights are less significant than blocked-up shadows).

The capture setting of the additional evaluation image is also based upon a comparison (260) of depth of field and range information. This involves estimating the best focus distance (or range) for each of multiple regions in the composed scene, such as each region 90 of the scene 92 in FIG. 13. The focus distance, zoom position, and aperture of the current capture state allow computation of the nearest and farthest distances for which scene content is sharply resolved. The range for each region in the scene is compared to the depth of field delivered by the optical system. Each region of the scene can either be within the depth of field of the optical system, too close, too far, or unknown.
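
A sketch of the nearest/farthest computation, using the standard thin-lens depth-of-field formulas rather than anything specific to the disclosure; the circle of confusion is an assumed small-sensor value:

```python
def depth_of_field(focus_m, focal_mm, f_number, coc_mm=0.005):
    """Near and far limits of sharp focus for the current capture state."""
    f = focal_mm / 1000.0                               # focal length, meters
    hyper = f * f / (f_number * coc_mm / 1000.0) + f    # hyperfocal distance
    near = hyper * focus_m / (hyper + (focus_m - f))
    if focus_m >= hyper:
        return near, float("inf")
    far = hyper * focus_m / (hyper - (focus_m - f))
    return near, far

def classify_region(region_range_m, near, far):
    """Compare a region's estimated range to the delivered depth of field."""
    if region_range_m is None:
        return "unknown"
    if region_range_m < near:
        return "too_close"
    if region_range_m > far:
        return "too_far"
    return "in_focus"
```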

If a significant number of regions are determined to be too close, too far, or unknown, then the decision is made (270) to calculate an alternate focus distance. If an alternate focus distance is needed, the desired alternate focus distance is calculated (280). If the regions that are not in adequate focus are generally too far, then a farther focus distance is chosen, particularly one that provides a depth of field that overlaps with the depth of field in the current evaluation image. If the regions that are not in adequate focus are generally too close, then a closer focus distance is chosen, particularly one that provides a depth of field that overlaps with the depth of field in the current evaluation image.

Acquisition of the additional evaluation image at an alternate focus distance allows the complex feature analysis to make better range estimates for different regions within the scene. The range is coupled with cues derived from the exposure, balance, and other analyses to provide a best selection of main subject location. This intelligent analysis can provide a final capture state altering the depth of field to include all subject content. The depth of field is controlled by adjusting the aperture. For example, the depth of field may be controlled to specifically include all faces in a scene, leaving the background less in focus. Alternatively, the depth of field can be adjusted to include only the largest or most central face in the scene, leaving others less in focus. This depth of field approach, unlike standard auto-focus systems, provides automatic depth of field adjustment.

In a camera with a switchable macro element and control 2, the focus distance will be controllable in at least two switchable ranges. With two ranges of focus distances available, the estimated range data is compared with both sets of focus distances and the macro control is switched accordingly to accommodate the range of scene content.

After calculation of an alternate focus distance (if needed), control passes to block 290 and exits FIG. 11. Upon exit from FIG. 11, flow returns to FIG. 8.

It is preferred that the complex feature analysis (150) include a determination of subject and background that uses range data from the rangefinder or from focus analysis for the different regions of the scene image. The criteria used for separating the different regions into subject and background can vary, depending upon expected camera usage. A convenient criterion is that a region is background if the measured distance for the region is greater than some multiple of the measured distance of the nearest region, and a region is subject if the measured distance is less than or equal to that multiple of the measured distance of the nearest region. A convenient multiple is two. Another convenient criterion, which can be applied by itself or in combination with the last criterion, is that a region is background if the measured distance corresponds to the infinity distance for the taking lens. For example, with some lens settings, this distance is 12 feet or greater. Another criterion that can be applied by itself or with one or more other criteria is that the outer regions of the image are background. This criterion is most useful if applied as a counterpart to a determination of close inner regions of the image. Another criterion is that, if the flash unit has fired, then brighter regions, or regions that are both brighter and closer, represent the subject and other regions are background. This criterion is conveniently used as a fallback when other distance-based criteria are ambiguous. Still another criterion is that, if the rangefinder detects only subject matter at the infinity distance, then regions that are brighter or bluer or both are considered sky. An advantage of the criteria just mentioned is simplicity. Other, more complex criteria, such as pattern detection, can also be used.
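
A sketch combining the first two criteria (nearest-region multiple of two, plus the infinity-distance test, with 12 feet converted to roughly 3.7 meters); the function name and handling of unknown ranges are assumptions:

```python
def split_subject_background(region_ranges_m, multiple=2.0, infinity_m=3.7):
    """Label regions as subject or background from a coarse range map."""
    known = [r for r in region_ranges_m if r is not None]
    if not known:
        return ["unknown"] * len(region_ranges_m)
    nearest = min(known)
    labels = []
    for r in region_ranges_m:
        if r is None:
            labels.append("unknown")
        elif r >= infinity_m or r > multiple * nearest:
            labels.append("background")
        else:
            labels.append("subject")
    return labels
```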

In a particular embodiment, the method includes user input in the determination of a final capture state. This can be useful with scenes having too large an exposure range to capture in a single image and is conveniently provided as an option to fully automatic camera self-determination. Following the complex image analysis, the user can be offered a simple choice of capturing an image with more highlight information or more shadow information. Alternatively, the user can be given these two choices and a third alternative, staying with the final capture state self-determined by the camera. Other variations, including estimating several alternatives and only showing those that are significantly different, are also possible. The camera can display a best estimate of exposure in the main preview display and show one or two small inset images previewing alternative (highlight or shadow) exposures. A simple selector switch or other input control can be provided to allow the user to select one of the alternate exposures. If the user decides to capture the image without selecting either of the alternate exposures, the default case is used. The alternative capture states could be indicated by icons or the like, but this is not preferred, since it is easier for the user to view the proposed results. Icons such as text or pictures can also be used in addition to images, as can an audible prompt.

Changes in depth of field and focus zone can be presented in a similar manner. Areas of the presented evaluation images detected at various distances can be blurred or sharpened digitally to mimic optical blurring and sharpening resulting from changes in lens focus and depth of field.

Other alternative capture states can be presented in the same manner. For example, if the camera detects a dark subject against a dark background that is out of flash range, the camera can suggest two capture options: one for normal flash, depicted by lightening the subject only; and another for night portrait, depicted by lightening the subject and lightening the background to some degree. In night portrait, the subject is mainly exposed by flash illumination, but the shutter remains open long enough to provide an ambient-light background exposure. Night portrait mode is designed for situations in which a flash exposure sufficient for a foreground subject is insufficient to illuminate the background adequately. With night portrait mode, the subject is well exposed by the flash against a visible background. In keeping with the goal of intelligent simplicity, motion analysis can be coupled with the distance and ambient light analysis. If the camera is being held quite steady, it could automatically engage night portrait mode. If a modest degree of motion is detected, insufficient to suggest a typical action scene but enough to cause significant blur with a long exposure, then the camera can default to normal flash usage. Additional non-image data can also be used to complement the image data in the above analyses.

In order to maintain a consistent rate for display refresh, or by reason of other processing constraints, the processing in step (150) can be partitioned to execute in small increments, so that a portion of the block can be executed every preview cycle (display of the next evaluation image to the user). The complex feature analysis is completed over multiple preview cycles. The additional cycles each include a new first assessment of a new pair of initial evaluation images. Individual analyses that are relatively slow, but only consider previously captured evaluation images, can be made interruptible so as to execute over multiple cycles. In this case, the analyses can also begin during preparation of the first assessment and can be completed during the complex feature analysis of the same cycle or a later cycle. This approach can be used in other activities that are also utilizing processing resources. For example, activities that can heavily load the processor, such as compressing and writing a video to storage and transmitting captured images over a wireless network connection, can be executed over multiple cycles.

Further complex processing can be included, subject to the constraints already discussed. Such processing can include use of adaptive tone scales, adaptive color processing, geometric corrections, or even particular special effects.

The assessments can also be used in determining post-capture processing of final images. Depending on the magnification and size of the largest connected region of skin pixels in the scene, the spatial processing capture parameters can be adjusted to optimize sharpening for the image. For example, preferred sharpening for a close-up portrait is significantly less than for a standard scene. Optimum sharpening and noise reduction parameters can be determined by analyzing the textures in the skin regions. Skin regions with very little texture suggest greater sharpening can be applied, while skin regions with greater texture suggest sharpening be minimized. More complex processing (such as blemish concealment and expression enhancement) can be optimized as well, if the processing constraints in the camera can support more complex processing. For example, the evaluation image can be analyzed to determine eye positions within the image and locate faces. This kind of geometric analysis allows both reliable detection of faces and estimation of face size, which helps in optimization of sharpening and other enhancements. This approach requires greater computing resources.

After the final capture state is set, a check is made (190) as to whether capture of the final image has been triggered. If final image capture is not required, the evaluation preview process ends (198). If final image capture has been triggered, the final image is captured (195) and control continues to the end (198). After end 198, the process returns to the start (100). Final image capture (195) can be immediately followed by all necessary processing of the final image, or the final image can be buffered for later processing.

FIG. 9 shows the overall decision flow used in another embodiment in a digital still camera. This embodiment differs from FIG. 8 in that the complex feature analysis (150) is limited to the initial evaluation images and analysis of additional evaluation images is eliminated. This is illustrated in FIG. 12, in which the remaining steps of the complex feature analysis correspond to like-numbered steps earlier discussed in relation to FIG. 8, with the exception that all steps are limited to the initial evaluation images. The approach of FIG. 9 significantly reduces processing requirements and firmware complexity, but this embodiment is less able to optimize the final capture setting for scenes with broad exposure range or depth of subject matter.

FIG. 10 shows the overall decision flow used in still another embodiment in a digital still camera. This embodiment differs from FIG. 8 in that motion analysis is eliminated. The complex feature analysis of FIG. 10 is that of FIG. 11. The approach of FIG. 10 reduces processing requirements, but cannot identify and respond to action scenes.

The invention has been described in detail with particular reference to certain particular embodiments thereof, but it will be understood that variations and modifications can be effected within the spirit and scope of the invention.

1. A method for setting a camera for image capture, said method comprising the steps of: capturing an initial set of two or more evaluation images; assessing a plurality of characteristics of said initial set of evaluation images to provide a first assessment, said characteristics including subject motion between at least two of said initial set of evaluation images; when said subject motion is in excess of a predetermined threshold, setting a final capture state of said camera responsive to said first assessment; when said subject motion is less than said predetermined threshold: (a) further analyzing said evaluation images to provide analysis results; and (b) setting said final capture state of said camera responsive to said first assessment and said analysis results.
2. The method of claim 1 further comprising, when said subject motion is less than said predetermined threshold: presenting said analysis results to a user; and accepting user input following said presenting; wherein said final capture state is responsive to said user input.
3. The method of claim 1 further comprising, when said subject motion is less than said predetermined threshold: capturing one or more additional evaluation images after said capturing of said initial set of evaluation images; determining said characteristics of said additional one or more images to provide a second assessment; and analyzing said second assessment to provide analysis results; wherein said final capture state is responsive to all of said analysis results.
4. The method of claim 3 further comprising, when said subject motion is less than said predetermined threshold: presenting said analysis results to a user; and accepting user input following said presenting; wherein said final capture state is responsive to said user input.
5. A method for setting a camera for image capture, said method comprising the steps of: capturing an initial set of two or more evaluation images; assessing a plurality of characteristics of said initial set of evaluation images to provide a first assessment, said characteristics including subject motion between at least two of said initial set of evaluation images; when said subject motion is in excess of a predetermined threshold, setting a final capture state of said camera responsive to said first assessment; when said subject motion is less than said predetermined threshold: (a) capturing one or more additional evaluation images after said capturing of said initial set of evaluation images; (b) determining said characteristics of said additional one or more images to provide a second assessment; (c) analyzing both said assessments; and (d) setting said final capture state of said camera responsive to said analyzing.

6. The method of claim 5 further comprising displaying each of said evaluation images to a user; wherein said determining and analyzing is completed following said displaying of said additional evaluation images.
7. The method of claim 5 wherein said assessing further comprises determining one or more additional characteristics of said initial set of evaluation images, said determining being more computationally intensive than said assessing.
8. The method of claim 7 wherein said additional characteristics include differences in edge maps.
9. The method of claim 5 further comprising, when said subject motion is less than said predetermined threshold: presenting results of said analyzing to a user; and accepting user input following said presenting; wherein said final capture state is responsive to said user input.
10. The method of claim 5 further comprising: receiving a trigger signal during said assessing; and following the respective said setting, capturing one or more final images with said camera in the respective said final capture state, responsive to said trigger signal.
11. The method of claim 10 wherein said evaluation and final images are frames of a continuous video segment.
12. The method of claim 10 wherein said evaluation and final images are still digital images and said method further comprises archiving said final images and deleting said evaluation images, without user intervention.
13. The method of claim 10 wherein said initial and final capture states differ in values of one or more of: focal length, focus distance, aperture, exposure time, and gain.
14. The method of claim 5 wherein said setting is free of user intervention.
15. The method of claim 5 wherein said characteristics include one or more of: depth of field, color balance, and focus.

16. The method of claim 5 further comprising classifying said scene in one of a plurality of predetermined classifications based on said analyzing to provide a scene classification, and wherein said capture state is responsive to said scene classification.
17. The method of claim 16 wherein said assessing of said plurality of characteristics further comprises ascertaining exposure range, focus, white balance, and skin detection.
18. The method of claim 5 wherein said camera is in a default capture state during said capturing of said initial set of evaluation images and wherein each of said capture states includes settings of a plurality of: focal length, exposure time, focus distance, aperture, white balance adjustment, and flash state.
19. A method for setting a camera for image capture, said method comprising the steps of: capturing an initial set of two or more evaluation images; assessing a plurality of characteristics of said initial set of evaluation images to provide a first assessment, said characteristics including subject motion vectors between at least two of said initial set of evaluation images; when said subject motion vectors are less than a predetermined threshold, capturing one or more additional evaluation images; then, determining said characteristics of said additional one or more images to provide a second assessment; then, analyzing both said assessments; and then, setting a final capture state of said camera responsive to said analyzing.
20. The method of claim 19 further comprising setting a final capture state of said camera responsive to said first assessment, when one or more of said subject motion vectors are in excess of said predetermined threshold.
21. The method of claim 19 further comprising: receiving a trigger signal during said assessing; and following the respective said setting, capturing one or more final images with said camera in the respective said final capture state, responsive to said trigger signal.
22. A digital camera comprising: a capture unit settable in a plurality of different capture states, said capture unit being actuable to capture a sequence of evaluation images of a scene and separately trippable to capture one or more final images; a control unit operatively connected to said capture unit, said control unit responding when said capture unit is actuated to capture said evaluation images and tripped to capture said one or more final images, said responding including: assessing a plurality of characteristics of an initial set of two or more of said evaluation images to provide a first assessment, said characteristics including subject motion between at least two of said initial set of evaluation images; when said subject motion is in excess of a predetermined threshold, setting a final capture state of said camera responsive to said first assessment; when said subject motion is less than said predetermined threshold: (a) capturing one or more additional evaluation images; (b) determining said characteristics of said additional one or more images to provide a second assessment; (c) analyzing both said assessments; and (d) setting said final capture state of said camera responsive to said analyzing.
23. The camera of claim 22 wherein said responding is without user intervention additional to actuating and tripping said capture unit.
24. The camera of claim 22 wherein each of said capture states includes settings of a plurality of: focal length, exposure time, focus distance, aperture, white balance adjustment, and flash state.
25. The camera of claim 22 wherein said evaluation and final images are frames of a continuous video stream.

26. The camera of claim 22 further comprising memory operatively connected to said control unit; and wherein said evaluation and final images are still digital images and said control unit archives said final images in said memory and deletes said evaluation images, without user intervention.